A correlated topic model of Science

Authors

  • David M. Blei
  • John D. Lafferty

Abstract

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982) 139–177]. We derive a fast variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. We apply the CTM to the articles from Science published from 1990–1999, a data set that comprises 57M words. The CTM gives a better fit of the data than LDA, and we demonstrate its use as an exploratory tool for large document collections.
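The role of the logistic normal can be made concrete with a short sketch of the CTM generative process: draw η ~ N(μ, Σ), map it to topic proportions θ, then for each word draw a topic z from θ and a word from that topic's distribution. This is a minimal illustration, not the authors' implementation. It assumes a K-dimensional η mapped to the simplex with the softmax θ_k = exp(η_k) / Σ_j exp(η_j), and the helper names (`cholesky`, `sample_eta`, `softmax`, `generate_document`, `pearson`), the covariance matrix, and the toy topic-word distributions `beta` are all hypothetical. For contrast, under a Dirichlet the covariance of any two distinct proportions, Cov(θ_i, θ_j) = -α_i α_j / (α_0²(α_0 + 1)), is always negative, so the Dirichlet cannot express two topics that tend to appear together; the logistic normal can, through the off-diagonal entries of Σ.

```python
import math
import random


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    k = len(sigma)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def sample_eta(mu, L, rng):
    """Draw eta ~ N(mu, Sigma) as mu + L z, where z is standard normal and Sigma = L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(len(mu))]


def softmax(eta):
    """Map natural parameters eta to topic proportions theta on the simplex."""
    m = max(eta)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in eta]
    total = sum(e)
    return [x / total for x in e]


def generate_document(mu, sigma, beta, n_words, rng):
    """Sample (theta, word ids) for one document; beta[k] is topic k's word distribution."""
    theta = softmax(sample_eta(mu, cholesky(sigma), rng))
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta)[0]      # topic assignment z_n
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]  # word w_n from topic z_n
        words.append(w)
    return theta, words


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.0, 0.0, 0.0]
    # Hypothetical covariance: topics 0 and 1 tend to co-occur; topic 2 opposes both.
    sigma = [[1.0, 0.6, -0.6],
             [0.6, 1.0, -0.6],
             [-0.6, -0.6, 1.0]]
    L = cholesky(sigma)
    thetas = [softmax(sample_eta(mu, L, rng)) for _ in range(5000)]
    print("corr(theta_0, theta_1):", pearson([t[0] for t in thetas], [t[1] for t in thetas]))
    # Hypothetical toy topics over a 4-word vocabulary.
    beta = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
    theta, words = generate_document(mu, sigma, beta, 10, rng)
    print("theta:", theta, "words:", words)
```

With this hypothetical Σ, where η_0 and η_1 are positively correlated and both are negatively correlated with η_2, the sampled θ_0 and θ_1 should show a positive sample correlation, which no Dirichlet prior could produce. The Cholesky factor is used only to turn independent standard normals into draws with covariance Σ.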

Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian e-books in the field of science using LDA topic modeling, evaluating the extracted keywords' similarity to the golden standard and users' evaluations of them. Methodology: This is a mixed-methods text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...


Optimal SVD-based Precoding for Secret Key Extraction from Correlated OFDM Sub-Channels

Secret key extraction is a crucial issue in physical layer security and a less complex and, at the same time, a more robust scheme for the next generation of 5G and beyond. Unlike previous works on this topic, in which Orthogonal Frequency Division Multiplexing (OFDM) sub-channels were considered to be independent, the effect of correlation between sub-channels on the secret key rate is address...


Phase II monitoring of auto-correlated linear profiles using linear mixed model

In many circumstances, the quality of a process or product is best characterized by a given mathematical function between a response variable and one or more explanatory variables, typically referred to as a profile. Several recent studies have monitored auto-correlated linear and nonlinear profiles. In the present paper, we use linear mixed models to account for autocor...


A probabilistic topic model based on local word relations in overlapping windows

A probabilistic topic model assumes that documents are generated through a process involving topics, and then tries to reverse this process to extract the topics from the given documents. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...


The Evaluation of a causal model of wisdom based on big five Factors of personality with the mediating role of basic psychological needs in university students

The aim of this study was to evaluate a causal model of wisdom based on the big five personality factors, with basic psychological needs as a mediator, in university students. The method of the present study was descriptive-correlational, using structural equation modeling. The statistical population of this study was undergraduate and graduate students of the academic year 2020-2021 in Golestan pro...


Wavelet Threshold Estimator of Semiparametric Regression Function with Correlated Errors

Wavelet analysis is a useful mathematical technique that has recently been widely used in statistics. In this paper, in addition to introducing the wavelet transform, we derive the wavelet threshold estimator of a semiparametric regression model with Gaussian correlated errors and compute the estimator's convergence rate. To evaluate the wavelet th...


Publication date: 2007